Inflammatory Bowel Diseases
Oxford University Press (OUP)
Preprints posted in the last 30 days, ranked by how well they match Inflammatory Bowel Diseases's content profile, based on 15 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Harris, D. M. M.; Bourgonje, A. R.; Braadland, P. R.; McShane, C.; Welz, L.; Waschina, S.; Ibing, S.; Tran, F.; Sands, B. E.; Dubinsky, M.; Suarez-Farinas, M.; Ueland, P. M.; McCann, A.; Detlie, T. E.; Bengtson, M.-B.; Kristensen, V.; Franke, A.; Colombel, J.-F.; Rosenstiel, P.; Croitoru, K.; Sokol, H.; Turpin, W.; Hov, J. R.; Hoivik, M. L.; Ungaro, R. C.; Schreiber, S.; Aden, K.
Background: Tryptophan (Trp) metabolism is a central immunometabolic axis in inflammatory bowel disease (IBD) and has been linked to inflammatory activity and immune regulation. While individual Trp metabolites have been associated with disease severity and treatment response, systems-level frameworks to define metabolic subtypes in IBD are lacking. Objective: To identify reproducible Trp-related metabolic subtypes ("metabotypes") in IBD and assess their association with disease activity, clinical outcomes, and early disease development. Design: We applied unsupervised clustering to serum concentrations of 16 Trp-related metabolites in a discovery cohort of patients with IBD undergoing biologic induction therapy (n=134). Metabotypes were validated in three independent IBD cohorts (total n>2,800), a healthy reference population, and a prospective cohort of first-degree relatives at risk for Crohn's disease. Associations with disease activity, longitudinal outcomes, and metabolic pathways were assessed using multivariable regression and survival analysis. Results: Four reproducible metabotypes with distinct metabolite profiles were identified across cohorts: Low Kyna, High Kyna, High Quin, and Balanced. Low Kyna and High Quin metabotypes were consistently associated with increased inflammatory activity and adverse clinical outcomes, including increased risk of treatment escalation and disease progression. Pathway-level analyses revealed alterations in NAD-related, lipid, and amino acid pathways between inflammatory metabotypes. A metabotype resembling inflammatory disease states was enriched in individuals who later developed Crohn's disease in a prospective pre-disease cohort. Conclusion: Trp-linked metabotypes define reproducible immunometabolic states in IBD that associate with disease activity and clinical outcomes and may precede disease onset.
These findings provide a framework for metabolic stratification and biomarker-guided clinical trials targeting immunometabolic pathways. What is already known on this topic: Tryptophan metabolism through the kynurenine pathway is a central immunometabolic axis in inflammatory bowel disease (IBD) and has been linked to inflammatory activity and immune regulation. Individual tryptophan metabolites have been associated with disease severity and treatment response, but their clinical utility for patient stratification remains limited. Systems-level approaches to define clinically meaningful metabolic subtypes in IBD are lacking. What this study adds: We identify four reproducible tryptophan-related metabolic subtypes ("metabotypes") that are consistently associated with disease activity across multiple independent IBD cohorts. Inflammation-associated metabotypes show distinct pathway-level alterations, including differences in NAD-related metabolism and broader metabolic programs. A metabotype resembling inflammatory disease states is detectable before clinical diagnosis in individuals who later develop Crohn's disease. How this study might affect research, practice or policy: Metabotype-based classification provides a framework for molecular stratification of patients in mechanistic studies and clinical trials targeting immunometabolic pathways. This approach may support biomarker-guided monitoring of disease activity and disease progression in IBD. Identification of preclinical metabolic states highlights the potential of metabolomics for early disease detection and prevention-oriented research strategies.
El Hajj, Y.; Slater, R.; Probert, C.; Tang, G.; Abreu, M. T.; Mishra, N.; Haglund, S.; Schreiber, S.; Hegazy, A. N.; Almer, S.; Rosenstiel, P.; Lyons, P. A.; Subramanian, S.
Background: Vedolizumab, a gut-selective anti-integrin therapy, is effective in IBD, but response rates remain variable. Conventional clinical and biochemical markers, including C-reactive protein and faecal calprotectin, have limited predictive value. Although recent transcriptomic studies have implicated T-cell-related signatures in predicting vedolizumab response, these findings lack validation across independent cohorts. Methods: We analyzed pre-treatment transcriptomic profiles from whole blood and T-cell subsets across five independent cohorts comprising 100 patients with UC and CD. The primary outcome was clinical response. Secondary outcomes included clinical and biochemical remission. Results: Among the 100 patients, 61 were responders and 39 non-responders, with no significant baseline clinical differences. Gene set enrichment analyses revealed downregulation of interferon alpha and gamma signalling in responders' baseline blood samples, a finding validated across independent cohorts. Downregulated interferon signalling at baseline was also observed in patients who achieved clinical and biochemical remission. To build a predictive model, an adaptive elastic net logistic regression model was applied to baseline whole-blood RNA-sequencing data. The classifier achieved an AUC of 1.0 in training, 0.71-0.83 in UC validation cohorts, and 0.64-1.0 in CD cohorts. Reduced interferon signalling was observed across CD4+ and CD8+ T-cell subsets, including regulatory T cells, suggesting a broad immune signature rather than cell-type specificity. Conclusions: Downregulated interferon signalling in peripheral blood prior to treatment is a reproducible molecular signature predictive of vedolizumab response and biochemical remission. Whole-blood transcriptomics revealed a robust interferon-axis signal that predicted vedolizumab response across independent cohorts, with stronger performance in UC than CD.
Given heterogeneous clinical endpoints and assessment windows, these data provide proof-of-concept that warrants validation with standardised, endoscopy-based outcomes.
Basson, A. R.; Katz, J.; Nguyen, V.; Singh, D.; Menghini, P.; Gomez-Nguyen, A.; Sieg, J.; Bell, M.; Thamma, K.; Ponzani, G.; Osme, A.; Rodriguez-Palacios, A.; Cominelli, F.
Background and Aims: Diet plays a critical role in managing Crohn's disease (CD) inflammation. We assessed whether dietary replacement of animal protein (AnimalP) by soy-pea protein (SoyP) decreases the pro-inflammatory potential of gut microbiota and intestinal inflammation in CD patients. Design: In an open-label, randomized controlled feeding trial at University Hospitals Cleveland Medical Center, CD participants and healthy controls were randomized (1:1) to a soy-pea or animal protein diet for 7 days. Primary outcomes were the absolute differences (d7-d0) in Crohn's Disease Activity Index (CDAI) score and fecal myeloperoxidase (MPO). Secondary outcomes included fecal calprotectin (FC) and high-sensitivity C-reactive protein (hsCRP). Murine fecal transplantation experiments were performed to determine the inflammatory potential of diet-altered gut microbiota. Results: The study randomized 66 participants, and 60 were included in the final analysis (n=31 CD, n=29 HC). After 7 days, CD-SoyP participants were more likely than CD-AnimalP participants to show reductions in Harvey-Bradshaw Index (HBI) (RR=4.68, 95% CI: 1.22-17.98, P=0.009) and fecal MPO (RR=2.30, 95% CI: 1.04-4.85, P=0.032), with a similar directional trend for CDAI (RR=1.52, 95% CI: 0.89-2.58, P=0.135). No participants experienced worsening of CDAI. The rank-based composite CDAI-MPO score was lower in the CD-SoyP vs CD-AnimalP group (median [IQR]: 5 [4-6] vs 8 [7-9]; P=0.012). Stratified analyses showed significant reductions in fecal MPO among CD participants with lower baseline disease activity (CDAI <150; P<0.0001), but not in those with higher activity (P=0.799). Conclusion: Short-term addition of plant-based soy-pea protein within a controlled diet exerted a beneficial, anti-inflammatory effect in CD, with evidence of greater effects among participants with lower baseline disease activity. ClinicalTrials.gov, Number NCT04065048.
Chen, J.; Li, A.; Wu, W.; Xu, W.; Zhao, T.; Starkweather, A. R.; Rodriguez, L.; Chen, M.-H.; Cong, X. S.
Background: Heterogeneity in symptom presentation and treatment response in irritable bowel syndrome (IBS) remains poorly understood. The gut microbiota may contribute to this variability, but its role in shaping symptom trajectories and responses to self-management interventions is unclear. Objective: To identify symptom trajectory phenotypes and determine whether gut microbiota composition and function distinguish these phenotypes and predict multidimensional responses to pain self-management interventions in young adults with IBS. Design: Ancillary data analysis from a randomized controlled trial (NCT03332537). Methods: Participants with longitudinal data (n = 62) were analyzed using longitudinal k-means clustering (KML) based on trajectories of measures in IBS quality of life (QOL), Brief Pain Inventory (BPI), and psychoneurological outcomes (anxiety, applied cognition, depression, fatigue, global health, positive affect, and sleep disturbance) over 12 weeks. Baseline differences between clusters were assessed with Wilcoxon rank-sum tests, and longitudinal changes were evaluated with linear mixed models. Gut microbiota composition and predicted functional pathways were compared between phenotypes. Bayesian Additive Regression Trees (BART) models were used to identify baseline microbial taxa and pathways predictive of longitudinal changes in QOL, BPI pain interference, and severity. Results: Two distinct trajectory-defined response phenotypes were identified: a Constrained Response Phenotype (Phenotype A, n = 35) and an Adaptive Multidomain Response Phenotype (Phenotype B, n = 27). At baseline, Phenotype B showed lower pain severity and interference, but higher levels of anxiety, depression, and fatigue compared to Phenotype A. Over 12 weeks, both phenotypes showed improvements in pain outcomes (all p < 0.05), but only Phenotype B demonstrated broad improvements across psychoneurological domains and QOL (all p < 0.05).
Phenotype A exhibited more limited improvements and worsening in several psychoneurological domains. Gut microbiota functional pathways differed between phenotypes, including pathways related to xenobiotic degradation, amino acid metabolism, bile secretion, and immune-related processes (all raw p < 0.05), although these did not remain significant after multiple testing correction. Machine learning models identified distinct, phenotype-specific microbial predictors of intervention response. In Phenotype A, genera such as Alistipes and Sutterella were consistently identified across models, whereas in Phenotype B, predictors included Phascolarctobacterium, Collinsella, and Parabacteroides. Functional pathways also differed between phenotypes, suggesting distinct microbiome-linked mechanisms underlying symptom trajectories and responses to pain interventions. Conclusions: Young adults with IBS exhibit distinct multidimensional response phenotypes that are associated with differential clinical and microbiome profiles. Baseline gut microbiota composition and functional capacity demonstrate phenotype-specific predictive signatures of treatment response, supporting a microbiome-informed framework for stratifying patients and advancing personalized self-management strategies in IBS.
Hawkins, R. L.; Cotterill, C.; McCormick, S.; Kellar, I.; Lobo, A. J.; Sampson, F. C.
Background: Unplanned hospital admissions in Inflammatory Bowel Diseases (IBD) account for nearly three-quarters of IBD inpatient stays in the United Kingdom. Although costly to services and distressing for patients, research exploring experiences and potential drivers of admissions is limited. We undertook a qualitative study to explore the healthcare experiences and access needs of people with IBD who had unplanned admissions, along with their caregivers and clinicians. Methods: Semi-structured interviews with 25 participants from a single tertiary IBD service in England (17 people with IBD, 3 informal caregivers, 5 clinicians) were conducted. We applied thematic framework analysis, guided by the Candidacy Framework, and worked with 2 patient and public contributors to generate final themes. Results: We identified four themes. 1) Difficulties in identifying flares and asserting severity before admission summarised the prevailing uncertainty in identifying a flare and accessing timely IBD care. 2) Navigating a disjointed healthcare system highlighted how a lack of care plans and systemic barriers can delay access. 3) Emergency care access challenges highlighted the gaps in emergency and inpatient care during flares. 4) Fighting for care and individual advocacy needs described the persistent need to assert oneself to obtain care, which may disproportionately affect access for vulnerable groups, and also highlighted the importance of positive interpersonal relationships. Conclusions: Individual, interpersonal and healthcare factors across the patient pathway were perceived to shape access to care in unplanned IBD admissions. Reducing admissions may require proactive strategies, including the integration of patient education, monitoring tools, establishment of specialist rapid-access pathways, and formal psychological support to address barriers to access.
Rifkin, S.; Markham, N. O.; Anderson, S. M.; Wilson, O.; Shrubsole, M.; Sears, C. L.; Rao, K.
Background: Recent mouse model data demonstrate that chronic colonization with toxigenic Clostridioides difficile promotes colonic tumorigenesis via intraluminal toxin B (TcdB), its main virulence factor. In a prior multisite hospital cohort, we found that history of positive tcdB stool testing was associated with increased CRC risk in a dose-dependent manner, though limited by small sample size. We aimed to validate this association in a larger cohort with extended follow-up and greater geographic distribution using the Veterans Health Administration (VHA) Corporate Data Warehouse (CDW). Methods: We conducted a retrospective cohort study among adults receiving care through the VA from 2000-2025 who underwent C. difficile testing. Data collected from the VHA CDW and National Death Index (NDI) included demographics, comorbidities, medications, CRC risk factors, and cancer incidence and death. The first C. difficile test date defined cohort entry; individuals with prior CRC were excluded. Ever C. difficile positivity was defined by a positive PCR or EIA result. The number of positive tests (episodes) was also determined to define recurrent positivity. Follow-up time ended at the first occurrence of CRC incidence or mortality, death from other causes, or censor date. Follow-up time was split for individuals who converted from negative to positive, with follow-up time updated accordingly. Multivariable Cox proportional hazards models were used to estimate hazard ratios (HRs) for C. difficile exposure and CRC incidence and mortality after adjustment for confounders. Tests for linear trend and tests for interaction were conducted to assess effect modification by sex and IBD status, while time-lag intervals were evaluated for 1, 3, 5, and 10 years before the outcome. Results: Among 806,844 veterans with C. difficile testing, those with positive tests were more likely to be older, male, to have diabetes, to use aspirin, and to have a lower BMI than those with negative tests.
Race and IBD prevalence were similar between the groups. There was no overall association between ever C. difficile positivity and CRC incidence (HR = 0.99, 95% CI 0.93-1.05). However, recurrent C. difficile positivity was associated with increased risk in a dose-response manner [2-3 episodes HR = 1.30 (95% CI 1.16-1.47), and >3 episodes HR = 1.58 (95% CI 1.17-2.14) compared to negative tests; p-trend < 0.001]. Further, ever C. difficile positivity was associated with increased CRC mortality risk (HR = 1.21, 95% CI 1.13-1.30; p < 0.001). Recurrent C. difficile positivity was associated with increased mortality risk, and the association was particularly strong for individuals with IBD who had >3 episodes (HR = 3.84, 95% CI 1.98-7.45). In sensitivity analyses, the increased risk of CRC incidence and mortality attenuated beyond 10 years. Conclusion: Prior positive C. difficile testing was associated with increased CRC incidence and mortality in a dose-dependent manner, particularly among patients with IBD. These findings extend animal model evidence, epidemiologically establishing C. difficile presence as an independent risk factor for subsequent colorectal tumorigenesis and supporting investigation into recurrent CDI, especially among patients with IBD, as a potential modifiable CRC risk factor.
Wu, P.; Yang, J.; Xian, Z.; Zhong, W.; Lu, L.
Background: This study evaluated the safety and efficacy of primary resection and anastomosis (PRA) for colovesical fistula (CVF) of diverse etiologies and identified independent prognostic factors for oncological outcomes. Methods: We retrospectively analyzed 112 CVF patients (2017-2024) undergoing PRA with or without a defunctioning stoma, comparing clinical outcomes across benign and malignant cohorts. Results: Benign etiologies accounted for 33.0% (n=37) (colonic diverticulitis (n=19, 51.4%), Crohn's disease (n=14, 37.8%), and iatrogenic injury (n=4, 10.8%)); all underwent PRA with partial cystectomy, achieving zero mortality and no recurrence. Malignancies (67.0%) primarily included colorectal adenocarcinoma (sigmoid colon cancer (n=44, 58.7%) or rectal cancer (n=31, 41.3%)). Within the malignant cohort, radical cystectomy (n=15) was strictly necessitated by advanced disease features, including distal tumor location and extensive bladder wall invasion (80.0% vs 36.7%, P=0.003). Consequently, this advanced cohort experienced longer operative times (589 vs. 289 min), higher blood loss (600 vs. 100 mL), increased morbidity (80.0% vs. 20.0%, P<0.001), and shorter disease-free survival (DFS) (8 vs. 20 months, P=0.008) compared to those amenable to partial cystectomy (n=60). Crucially, multivariate analysis identified perineural invasion (PNI) (HR: 3.83, 95% CI: 1.49-9.84; P=0.005) as a critical independent predictor of recurrence, reflecting the impact of tumor biology over surgical extent. Conclusions: PRA is a definitive and versatile strategy for CVF. In malignant cases, bladder-preserving strategies are oncologically viable when R0 margins are achievable. Integration of PNI status and neoadjuvant therapy was essential for refining personalized multidisciplinary management.
Sun, Y.; Jiang, Z.; Dan, L.; Qian, Y.; Wellens, J.; Yao, J.; Li, X.; Wang, X.; Magro, F.; Chen, Y.; Chen, J.
Objectives: The Mediterranean-DASH Intervention for Neurodegenerative Delay (MIND) diet has been associated with the risk of IBD, but its impact on clinical outcomes is uncertain. This study evaluated the association between MIND diet adherence and the risk of IBD-related surgery in a prospective cohort. Methods: This study included 2,288 participants from the UK Biobank with a diagnosis of Crohn's disease (CD, n=777) or ulcerative colitis (UC, n=1,511) who completed a valid WebQ 24-hour dietary recall. Dietary adherence was derived from a 15-component score based on 24-hour dietary recalls. Associations with IBD-related surgery were evaluated using Cox proportional hazards models, with nonlinear trends examined via restricted cubic splines. Effect modification was explored in pre-specified subgroups, and multiple sensitivity analyses were conducted to assess robustness. Results: During 10.9 years of follow-up, 166 incident IBD-related surgery cases occurred. Higher MIND diet adherence was associated with reduced surgical risk. Compared with the lowest tertile of adherence, the highest tertile showed a 36% reduction in surgical risk in IBD (HR 0.64, 95% CI: 0.44-0.94, P = 0.024). Notably, this protective effect was pronounced in patients with CD, exhibiting a clear linear inverse association. In contrast, a reverse J-shaped association was observed in UC, with a steep initial decline in surgical risk followed by a plateau emerging at a MIND score of approximately 5, beyond which further adherence conferred minimal additional benefit. At the component level, higher vegetable consumption and lower intake of butter and fried foods were identified as independent protective factors against surgery. Stronger inverse associations were observed among patients with shorter disease duration and those with complicated disease behavior, including stricturing or penetrating phenotypes (all P interaction < 0.05).
Conclusion: Greater MIND diet adherence is associated with reduced IBD-related surgery risk among patients with IBD and CD. These findings support the MIND diet as a feasible dietary strategy to improve IBD prognosis.
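As a reading aid for the abstract above, the reported "36% reduction" can be recovered from the hazard ratio by the usual convention of quoting one minus the HR as the relative reduction in hazard. The worked line below applies the same arithmetic to the reported confidence limits; treating the result as a range of plausible reductions is an interpretive assumption, not a figure stated by the authors.

```latex
\begin{align*}
\text{relative reduction} &= (1 - \mathrm{HR}) \times 100\% = (1 - 0.64) \times 100\% = 36\% \\
\text{upper CI limit } 0.94 &\;\Rightarrow\; (1 - 0.94) \times 100\% = 6\% \\
\text{lower CI limit } 0.44 &\;\Rightarrow\; (1 - 0.44) \times 100\% = 56\%
\end{align*}
```

Under that reading, the interval 0.44-0.94 corresponds to a reduction in hazard of roughly 6% to 56%.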
Chuah, C. S.; Gros, B.; Plevris, N.
Objectives: To describe the design, operational safeguards, and early use of ChatIBD, a specialty-specific generative AI platform for inflammatory bowel disease (IBD), during its first 6 months of live deployment. Methods: ChatIBD is an online question-answering platform that uses retrieval-augmented generation over a curated corpus of IBD guidelines. Queries undergo hybrid semantic and keyword retrieval with query expansion and reranking, and the model is instructed to answer only from retrieved material and return linked citations. Safeguards include fixed medication dosing information from the European Medicines Agency (EMA), user feedback capture, and clinician review of flagged outputs. We performed a descriptive service evaluation of aggregated, de-identified platform metrics collected between 1 October 2025 and 1 April 2026. Results: During the study period, ChatIBD registered 913 users and processed 7,222 messages across 3,855 conversations. Activity was recorded across 69 countries and 28 languages, with the highest message volumes from the United Kingdom (27.1%) and Spain (12.3%). Median daily message volume was 35.5 (IQR 20 to 52), and 85.1% of messages were submitted on weekdays. Medication-related queries accounted for the largest use domain, while guideline synthesis was the most frequent inferred intent. Sixteen explicit feedback events were recorded, including one negative rating that triggered clinician review and system changes. Conclusions: ChatIBD showed early international uptake and repeat use as a specialty-specific, retrieval-grounded generative AI tool for IBD professionals. These findings support the feasibility of deploying a guideline-grounded clinical AI service with practical safeguards, but do not establish response accuracy, safety, or clinical effectiveness. Formal validation is in progress.
What is already known on this topic: General-purpose large language models are increasingly being used informally by clinicians, but concerns remain about hallucinated content, unverifiable recommendations, and poor traceability to specialty-specific sources. What this study adds: This study describes the early deployment of ChatIBD, a specialty-specific retrieval-grounded generative AI tool for IBD professionals, and the safeguards used in its live operation. How this study might affect research, practice or policy: Early evaluations of live specialist AI tools may help guide governance, implementation, and validation. Uptake alone is not evidence of effectiveness, but it can help shape priorities for subsequent studies.
Santelices, J.; Schaefer, Z.; Gachunga, W.; Celeste, C.; Parker, I. K.
Background: Trained immunity is a durable functional reprogramming of innate immune cells characterized by enhanced responsiveness upon secondary challenge. While metabolic rewiring and epigenetic remodeling are well-established features of this process, the contribution of ubiquitin-mediated post-translational regulation remains poorly defined. Methods: We performed an integrative analysis of publicly available human transcriptomic datasets derived from monocytes, macrophages, and PBMCs exposed to established training stimuli (β-glucan, Bacillus Calmette-Guérin [BCG], and hemin-β-glucan) followed by secondary stimulation. A curated panel of deubiquitinating enzymes (DUBs) and E3 ubiquitin ligases with established immune functions was analyzed for differential expression. Gene Ontology (GO) and KEGG pathway enrichment analyses were conducted to evaluate higher-order convergence across independent datasets. Results: Across multiple trained immunity models, we identified reproducible transcriptional remodeling of ubiquitin-modifying enzymes. USP25, OTUB1, and TRIM25 were consistently upregulated following restimulation, whereas several chromatin- and cytokine-regulatory DUBs (including USP3, USP4, USP7, USP16, MYSM1, and USP38) were downregulated. Normalization to RPMI-restimulated controls reduced many activation-associated signals; however, USP25 remained persistently elevated, suggesting a stable training-associated signature. Pathway enrichment analysis independently demonstrated significant engagement of ubiquitin-related functional categories across datasets, supporting coordinated reorganization of ubiquitin regulatory networks. Conclusion: These findings identify selective transcriptional remodeling of the ubiquitin-proteasome system as a recurring feature of trained immunity.
Integrating ubiquitin signaling into the established metabolic-epigenetic framework expands the conceptual model of innate immune memory and suggests that ubiquitin-modifying enzymes function as modulatory rheostats shaping immune amplitude and stability. Future functional and proteomic studies are required to determine whether these transcriptional signatures directly mediate trained immunity phenotypes.
Stendahl, A.; Yu, J. X.; Jazrawi, S.; Jonica, E.; Rodriguez, J.; Javia, S.; Sharzehi, K.; Cote, G.
Background and Study Aims: Fully covered, self-expandable metal stents (FCSEMS) are used to treat biliary strictures. FCSEMS with transmural side holes may facilitate cystic duct drainage to mitigate risk of cholecystitis and impact other stent-related adverse events such as migration and occlusion. This study compared rates of premature stent occlusion and acute cholecystitis among patients with biliary strictures who underwent first-time placement of a FCSEMS with or without transmural side holes. Patients and Methods: This was a retrospective cohort study of adults who underwent endoscopic retrograde cholangiopancreatography (ERCP) with FCSEMS between April 2022 and April 2025 for malignant or benign extrahepatic bile duct strictures. Patients were followed for a minimum of 9 months or through planned stent removal. The primary outcome was premature bile duct occlusion. The secondary outcome was acute cholecystitis among patients with an intact gallbladder. Results: Among 219 patients meeting enrollment criteria, 57 (26%) had side holes. The rate of premature stent occlusion was similar with transmural side holes (12%) vs. without (11%, HR 1.02, 95% CI 0.42-2.43, p = 0.96). Among patients with an intact gallbladder (n=129), acute cholecystitis rates were similar with side holes (6%) or without (4.8%, HR 1.01, 95% CI 0.22-4.5, p = 0.99). Conclusions: FCSEMS with side holes do not reduce rates of premature bile duct stent occlusion or acute cholecystitis compared to FCSEMS without side holes.
Steininger, H. M.; Iglesias-Aguirre, C. E.; Panzer, A. R.; Durack, J.; McKean, M.; Cabana, M. D.; Diamond, S.; Lynch, S. V.
Childhood atopic disease is linked to delayed gut microbiome development and metabolic dysfunction; however, microbial drivers remain unclear. To explore microbial correlates of asthma risk during a time of active gut microbiome development, we analyzed stool from 6-month-old infants at high asthma risk (HR) or healthy controls (HC), using genome-resolved metagenomics (HR=7; HC=12) and untargeted metabolomics (HR=11; HC=15). We recovered 82 bacterial species-level metagenomic-assembled genomes (MAGs). Global taxonomic composition did not differ by asthma risk. Anticipating that key differences might associate with specific genomes, a machine-learning approach pinpointed Bacteroides cellulosilyticus, Hungatella effluvii, and Enterocloster aldenensis as linked with asthma risk status. All three species were more abundant in HC infants, and the B. cellulosilyticus genome was enriched for carbohydrate metabolism genes relative to other MAGs. Metabolomic profiling revealed variance associated with asthma risk (PERMANOVA, R² = 0.069, p = 0.016). HR fecal metabolomes were enriched in simple sugars, whereas HC contained more nitrogenous compounds. Integrative genome-metabolic modeling of compounds that significantly differentiate asthma-risk groups revealed risk-dependent interactions with community-encoded metabolic potential (CEP) for arabinose and agmatine, whose fecal concentrations are linked with B. cellulosilyticus and H. effluvii functional traits, respectively. These findings suggest that microbial-influenced metabolic differences associate with asthma risk at 6 months, with B. cellulosilyticus and H. effluvii emerging as candidate bacteria influencing this observed metabolic remodeling.
Impact statement: Leveraging a random forest classifier, we identified three bacterial species (Bacteroides cellulosilyticus, Hungatella effluvii, and Enterocloster aldenensis) as distinguishing features enriched in healthy 6-month-old infant microbiomes compared to those at high risk of asthma development (HR). We developed an approach to integrate metabolomics and metagenomic-derived microbiome community-encoded potential (CEP) with clinical outcomes to identify fecal metabolites whose concentrations are likely to be influenced by the microbiome. Fecal arabinose concentrations were positively associated with CEP in healthy infants, but not in HR subjects, who exhibited elevated concentrations irrespective of CEP. These data implicate microbial activity as a contributor to the concentration of this metabolite in healthy but not HR infants. With leave-one-out cross-validation, we identified B. cellulosilyticus as a contributor to fecal arabinose concentrations. Our data indicate that microbial functional deficits in HR infants are associated with gut metabolic dysfunction during microbiome maturation. Data summary: Durack et al. [1] is the source of the metabolomics data utilized in this study. The authors confirm that all other supporting data, code and protocols have been provided within the article or through supplementary data files.
Wittlinger, S.; Meerjansen, J.; Wolf, F.; Wiest, I. C.; Ebert, M. P.; Siegel, F.; Belle, S.
Objective: Structured extraction from clinical free-text depends on human annotators whose labels are susceptible to errors and knowledge-driven mistakes; exhaustive quality control is impractical at scale. We evaluate whether disagreement among multiple locally hosted large language models (LLMs) can prioritize human annotations for targeted review. Methods: Multiple LLMs independently extract the same set of structured variables annotated by a human reviewer. For each annotation, an agreement score counts the LLMs matching the human label. Using four locally hosted LLMs (Gemma 3 27B, DeepSeek-R1 70B, GPT-OSS 120B, Mistral Large 3), we evaluated this approach on 910 German-language colonoscopy reports describing endoscopic mucosal resection, with five structured variables per case (anatomical location, two diameters, resection technique, multiple polyps), yielding 4,550 annotations and a 377-case adjudication sample. A stratified sample oversampling low-agreement strata was adjudicated blinded by an experienced reviewer and analyzed with prevalence-adjusted estimates. Results: Human error rates rose as LLM agreement fell, from 0% at scores 3-4 to 76% at score 0. The lowest-agreement stratum was only 6.5% of annotations yet concentrated an estimated 80% of errors. The multi-LLM disagreement score achieved a prevalence-adjusted AUC-ROC of 0.991 (95% CI 0.987-0.994) and AUC-PR of 0.893 (95% CI 0.851-0.929) for error detection. Discussion: Multi-LLM disagreement outperformed single models and provided graded operating points for risk-stratified review. Conclusion: Multi-LLM disagreement provides a scalable quality-control signal for targeted review of the highest-yield cases. Because all models run locally, the framework is GDPR-compliant; its language- and task-agnostic design supports application across clinical domains.
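The agreement score described in the abstract above (the number of LLMs whose extraction matches the human label) can be illustrated with a minimal Python sketch. The function names, the dictionary layout, the exact-match comparison, and the example labels are all assumptions made for illustration; the abstract does not specify how labels are normalized or compared.

```python
def agreement_score(human_label, llm_labels):
    """Count how many LLM extractions equal the human label (exact match, an assumption)."""
    return sum(1 for label in llm_labels if label == human_label)


def review_queue(annotations):
    """Return (score, id) pairs sorted so the lowest-agreement annotations come first."""
    scored = [(agreement_score(a["human"], a["llm"]), a["id"]) for a in annotations]
    return sorted(scored)


# Hypothetical annotations: one human label and four LLM labels per variable.
annotations = [
    {"id": "case1/location", "human": "sigmoid",
     "llm": ["sigmoid", "sigmoid", "sigmoid", "sigmoid"]},
    {"id": "case2/location", "human": "cecum",
     "llm": ["rectum", "rectum", "rectum", "rectum"]},
    {"id": "case3/technique", "human": "piecemeal",
     "llm": ["piecemeal", "en bloc", "piecemeal", "en bloc"]},
]

queue = review_queue(annotations)
for score, annotation_id in queue:
    print(score, annotation_id)
```

With four models the score ranges from 0 to 4, so sorting in ascending order puts the annotations that no model agrees with at the front of the review queue, which is the stratum the abstract reports as concentrating most human errors.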
Krausz, M.; Zhao, B.; Mrovecova, P.; Proietti, M.; Grimbacher, B.
Background: CTLA-4 haploinsufficiency (CHAI) and LRBA deficiency cause severe immune dysregulation including enteropathy. Abatacept, a CTLA-4-immunoglobulin fusion protein, targets the underlying pathway defect, but its impact on the gut microbiome remains undefined. Methods: We performed longitudinal shotgun metagenomics (MetaPhlAn4/HUMAnN3) on stool samples from patients enrolled in the ABACHAI clinical trial, collected at pre-treatment baseline and months 3, 6, and 12. Healthy individuals from the same household served as controls. Compositional and functional microbiome changes were analyzed using linear mixed-effects models and MaAsLin3, and correlated with organ-specific CHAI Morbidity Scores. Results: At baseline, patients showed significantly reduced alpha diversity (Shannon index, p=0.0029) and distinct community composition (PERMANOVA p=0.0001) compared to healthy controls, characterised by enrichment of oral-associated taxa (Veillonella, Streptococcus, Lacrimispora) and depletion of butyrate-producing commensals (Ruminococcus, Oscillibacter, Dysosmobacter). Functionally, the baseline metagenome exhibited broad reductions in amino acid and SCFA biosynthesis alongside enrichment of purine salvage and folate pathways. During treatment, beta diversity shifted significantly with treatment duration (Aitchison PERMANOVA R²=0.103, p=0.015), with within-patient community turnover peaking at month 6 (Δ=0.216, p=0.006). Longitudinal analyses demonstrated progressive decreases in disease-enriched taxa (Veillonella, Lacrimispora) and recovery of commensals (Collinsella, Adlercreutzia). FDR-significant reductions in microbial folate and purine biosynthesis pathways were observed over the treatment course. Gut CHAI domain severity correlated inversely with butyrate-producer abundance and positively with oral taxon enrichment.
Conclusion: In CTLA-4 pathway insufficiency patients, abatacept therapy is associated with an improvement of enteropathy and a progressive, measurable gut microbiome restructuring, positioning microbiome dynamics as a candidate biomarker of treatment response in this monogenic immune dysregulation disorder.
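For readers unfamiliar with the alpha diversity measure reported in this abstract, the Shannon index of a sample is H = -Σ p_i ln(p_i), where p_i is the relative abundance of taxon i. The sketch below is a generic illustration only: the natural logarithm, the function name, and the toy abundance vectors are assumptions, and it is not the authors' analysis pipeline.

```python
import math
from typing import Sequence


def shannon_index(abundances: Sequence[float]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with p_i > 0.

    Abundances may be raw counts or relative abundances; they are
    renormalised to proportions before the index is computed.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


# Toy profiles: an even community versus one dominated by a single taxon.
even_community = shannon_index([25, 25, 25, 25])
dominated_community = shannon_index([97, 1, 1, 1])
```

An even community scores higher than a dominated one, which is the sense in which a reduced Shannon index indicates lower alpha diversity.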
Lin, H.-M.; Lyu, J.; Wang, I.-L.
Background: Hospital incident risk scoring has long relied on two- or three-dimensional frameworks (Severity Assessment Codes or Risk Priority Numbers), even though root cause analysis standards recognize that clinical risk is multi-factorial. The obstacle has been mainly cognitive: human reviewers cannot reliably score many dimensions across high incident volumes, so richer assessment has not been operationalized at scale. Objective: To extend the traditional three-dimensional FMEA to an eight-dimensional patient-safety risk feature framework, to establish a multi-model large language model (LLM) extraction pipeline that scores these dimensions automatically, and to demonstrate a variance-aware integer optimization (mean-variance integer programming, MV-IP) that provides a reproducible tie-breaking rule for incident prioritization under extraction uncertainty, rather than improved risk coverage. Methods: An 8-dimensional framework covering harm severity, potential harm, frequency, detectability, systemic impact, vulnerable populations, regulatory relevance, and economic impact was applied to 213 synthetic and 196 real curated incident narratives. Three independent LLMs (GPT-5.4, Gemini 3.1 Pro, Grok-4.1 Fast) from different provider families extracted structured risk scores. Inter-model consistency was assessed via ICC(A,1). Among coverage-equivalent selections, MV-IP minimized inter-model variance to give a reproducible prioritization rule. An English-language sensitivity analysis was conducted on 31 AHRQ PSNet WebM&M cases. Results: On real cases, seven of eight dimensions reached Fair or better inter-model reliability (ICC(A,1) 0.53 to 0.83); D5 (Systemic Impact) was the exception at Poor reliability (0.275), driven by little between-case variation rather than by wide model disagreement.
Reliability was not uniform: two dimensions were Excellent (D1 actual harm 0.834, D8 economic impact 0.782), two Good, and three only Fair, so some dimensions are more readily extractable than others. The same anchors gave broadly similar results on English-language narratives. When deterministic top-K selection returned several equal-coverage solutions (11 on real cases, total inter-model variance 0.205 to 1.274), MV-IP selected the minimum-disagreement set, replacing ad hoc tie-breaking with an explicit rule without improving coverage. Bootstrap resampling found 74% to 90% of per-case variance estimates stable despite the three-model panel. Conclusions: The eight-dimensional framework operationalizes patient-safety risk features that quality teams have considered only implicitly, and three independent LLM families produced reproducible scores on most dimensions of curated narratives. Inter-model agreement, however, measures reproducibility rather than clinical correctness, and high agreement does not by itself establish that a score is right; the dimensions that are reliably extractable today (notably D6 and D8) differ from those that are not yet (D5, and to a lesser degree D4 and D7), which has direct implications for incident-reporting form design. MV-IP contributes a reproducible, variance-aware tie-breaking rule rather than improved coverage. Validation against expert-prioritized RCA lists and deployment on raw institutional incident reports remain the next steps toward clinical use.
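The tie-breaking idea in this abstract (among top-K selections with equal coverage, prefer the one with the least inter-model variance) can be sketched in Python. This is not the authors' MV-IP formulation: it swaps the integer program for brute-force enumeration, collapses the eight dimensions to one score per case per model, and defines coverage as the sum of mean scores. All of these choices, along with the function name and example data, are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance
from typing import Dict, List, Sequence


def select_top_k(scores: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Pick k cases with maximal coverage, breaking ties by variance.

    scores maps a case id to the risk scores given by each model.
    Coverage (assumed here) is the sum of per-case mean scores; ties in
    coverage are broken by the smallest total inter-model variance.
    """
    stats = {cid: (mean(vals), pvariance(vals)) for cid, vals in scores.items()}
    best_key, best_subset = None, None
    for subset in combinations(sorted(stats), k):
        coverage = sum(stats[cid][0] for cid in subset)
        variance = sum(stats[cid][1] for cid in subset)
        # Rounding guards against float noise when comparing coverage.
        key = (-round(coverage, 9), variance)
        if best_key is None or key < best_key:
            best_key, best_subset = key, subset
    return list(best_subset)


# Hypothetical scores from three models; B and C tie on mean score (4),
# but the models disagree on B and agree on C.
example_scores = {"A": [5, 5, 5], "B": [3, 4, 5], "C": [4, 4, 4]}
selected = select_top_k(example_scores, k=2)
```

Enumeration is only practical for small case counts; an integer-programming solver would be needed at the scale the abstract describes.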
Roy, J.; Korleski, J. B.; Augustin, R. C.; Yefet, L.; Jensen, Z. D.; Ehman, E. C.; Zadeh, G.; Conners, A. L.; Tevaarwerk, A. J.; Korfiatis, P.
Background: Preparing tumor board patient summaries is time intensive. Large-language-model based systems may automate summarization but require real-world evaluation prior to clinical use. We performed an exploratory retrospective evaluation of the Microsoft Healthcare Agent Orchestrator (HAO), deployed in a Mayo Clinic controlled staged environment, to generate tumor board-style patient summaries from retrospective Electronic Health Record (EHR) notes. Methods: HAO generated summaries for breast, hepatobiliary, and neuro-oncology tumor board cases using up to the most recent 1,000 clinical notes. Clinician reviewers evaluated outputs via REDCap surveys across perceived factuality, completeness, clarity/conciseness, temporal cohesion, comparative performance, safety, and clinical utility (0-4 Likert scale). Reviewers were permitted to query the HAO chat interface to address missing details. Automated factuality was assessed using TBFact (bidirectional entailment), reporting precision and recall against available reference summaries. Results: Among 57 survey responses from 5 different physicians, mean scores exceeded 2.8 across domains, with medians of 3 for most axes. In an exploratory comparison, oncology fellows required less time to review HAO-generated summaries than to manually generate patient summaries (mean difference 13.57 minutes per patient, p<0.001), although this difference may be influenced by prior familiarity with the same cases; 96% of survey responses indicated that HAO would save time. TBFact evaluations showed higher recall than precision across domains, consistent with broad capture of reference content alongside additional content that was not present in gold-standard summaries. Attribution was viewed favorably but showed issues with primary-source specificity and link reliability. Conclusions: In a controlled Mayo environment, HAO demonstrated moderate performance and was associated with reduced review time for tumor board preparation. 
These findings are promising but preliminary and do not establish clinical safety, noninferiority to manual review, or readiness for routine clinical use. Limitations, including verbosity, specialty-specific content gaps, and inconsistent attribution, highlight the need for iterative refinement and further evaluation.
Wang, Y.; He, H.; Zhu, R.; Lu, Y.; Phadungsaksawasdi, P.; Peng, M.; Liu, Z.; Zou, K.; Zhang, Y.; Chew, S. P.; Tham, Y. C.; Khorasani, A.; Deng, H.; Cheng, C.-Y.; Yang, J.; Liu, D.
Background Patients worldwide receive healthcare in many languages, yet medical AI systems are validated almost exclusively in high-resource languages such as English and Chinese, exposing patients in other linguistic settings to unquantified diagnostic risk. Existing multilingual evaluations rely on translated research-style benchmarks that fail to capture authentic clinical work. We aimed to characterise the patient safety consequences of multilingual medical AI deployment in real-world clinical settings and to develop an auditable detection method for unsafe outputs. Methods We evaluated different language models (LLMs) and visual language models (VLMs) across four real-world clinical tasks (conversational QA, radiology report generation, glaucoma diagnosis, ICU re-intubation prediction) in five languages (English, Chinese, Malay, Thai, Persian). We developed a token-level uncertainty toolkit to localise reasoning instability, compared three inference paradigms (native-language, English chain-of-thought, back-translation pivot), and conducted a prospective study (50 dialogues, 150 physician-reviewed records). Findings LLM/VLM performance degraded consistently from high- to low-resource languages across all tasks. Key gaps included: HealthBench score declining from 0.3743 to 0.3180; radiology macro-F1 from 0.2938 to 0.2149-0.2424, consistent with selective pathology suppression; glaucoma accuracy from 50.7% to 32.7%; ICU parameter recall from 100.0% to 48.5%. Multimodal inputs amplified degradation. Qwen3 VL 235B showed attenuated decline with no resource-ordered pattern in glaucoma classification. Token-level analysis localised instability to mid-chain stages (40-70% of the normalised trajectory); perplexity-based confidence failed to flag errors (AUROC 0.41-0.66). Back-translation pivot consistently restored performance. 
In the prospective study, 98.7% of records required physician edits (overall modification score 53.6%); Thai-pivot correction burden (59.0%) exceeded English-pivot (50.7%, p=0.003) and Chinese-direct (51.0%, p=0.004). Interpretation Multilingual deployment produced clinically consequential failures, including missed pathology, distorted physiological extraction, and amplified multimodal misclassification, that were invisible to monolingual validation and not reliably flagged by model confidence. Pretraining data composition may contribute to multilingual safety risk. Language-specific safety auditing should precede deployment in non-dominant-language healthcare settings; the open-source detection toolkit enables this without model retraining.
Almotah, K.; Tran, U.; Schweickart, R. A.; Gilbert, H.; Fisher, R. C.; Bisikalo, Y.; Ali, M.; Buhaya, M.; Cheng, M.; Cruise, M.; Chi, Z.; Sarvestani, S. K.; Huang, E. H.; Wessely, O.
Ulcerative colitis is a chronic inflammatory bowel disease that can progress from dysplasia to cancer. Inflammatory responses are critical drivers in this process, typically triggered by epithelial lesions and the ensuing infiltration of microbiota into the interstitial layer. Here, we focus on the pro-inflammatory state of the interstitial fibroblasts, which promotes immune infiltration and augments disease progression. The study aims to provide a mechanistic link for how fibroblasts of the colitis-associated microenvironment integrate inflammatory signals, microbial infiltration and cellular memory. To this end, we investigated a large number of primary colon fibroblasts obtained from normal, colitis and colon cancer samples using a range of in vitro approaches and an in vivo co-inoculation cancer model. mRNA sequencing analysis identified that the disease-associated fibroblasts exhibit a cellular inflammatory status, which involves the injury-induced senescence pathway. Using CXCL8, a potent chemokine upregulated in colitis and cancer colon fibroblasts, as a paradigm, this inflammatory status is triggered by the activation of NF-κB signaling via immune-derived cytokines (TNF, IL-1β), bacterial signals (LPS) and the microbiome itself using mycoplasma as a paradigm. Finally, iPSC reprogramming studies indicate that fibroblasts from ulcerative colitis retain an epigenetic memory that sustains elevated CXCL8 expression. Together, our findings demonstrate that the senescence-associated secretory phenotype of colon fibroblasts is a robust indicator for inflammation-driven colon tumorigenesis.
Faghih, M.; Damm, M.; Kassik, M.-T.; Cheesman, L.; Rauschenberg, S.; Olesen, S. S.; Laheru, D. A.; Zheng, L.; Phillips, A. E.; Yadav, D.; Drewes, A. M.; Rosendahl, J.; Singh, V. K.; International Pancreatic Pain Consortium
Pain in pancreatic ductal adenocarcinoma (PDAC) is associated with poor survival, but whether altered pain processing carries prognostic significance is unknown. We analyzed a prospective cohort of 143 patients with PDAC who underwent pancreatic quantitative sensory testing (P-QST) after diagnosis. Patients were classified as having normal pain processing (n=84), segmental hyperalgesia (n=30), or widespread hyperalgesia (n=29). Survival was measured from the date of P-QST assessment. During follow-up, 70 deaths occurred. Widespread hyperalgesia was associated with increased mortality in unadjusted Cox analysis (HR 1.96, 95% CI 1.14-3.35) and after adjustment for age, sex, tumor stage, comorbidity, opioid treatment, and body mass index (adjusted HR 2.33, 95% CI 1.30-4.15). Segmental hyperalgesia was not associated with mortality. Kaplan-Meier analysis demonstrated lower survival probability in the widespread hyperalgesia group (log-rank p=0.025). These findings suggest that widespread hyperalgesia, reflecting altered central pain processing, identifies a subgroup of PDAC patients at increased risk of mortality independent of conventional clinical factors.
Guillot, J.; Miao, B.; Suresh, A.; Sushil, M.; Williams, C. Y.; Vashisht, R.; Oskotsky, T. T.; Sirota, M.; Butte, A. J.
Chimeric Antigen Receptor T-cell (CAR-T) therapy, where genetically engineered patient T cells target tumor antigens, has transformed care for hematologic malignancies but requires careful tracking of adverse events (AEs) often documented only in unstructured EHR notes. We evaluated a Large Language Model (LLM)-based approach in UCSF's secure environment to extract AEs, dates, grades, and interventions within 30 days post-infusion for six commercial CAR-T products (2012-2023), benchmarking against two evaluators. Using GPT-4-0314 in a zero-shot setting with four prompts (prespecified AEs, non-prespecified AEs, CRS, ICANS), we compared outputs against dual annotations on a random sample of 50 notes using accuracy, precision, recall, F1, and Cohen's kappa. From 4,762 progress notes for 293 patients (median age 65.6), CRS occurred in 80.2% (median onset 4 days); neutropenia 70.0% (16 days); neutropenic fever 64.8% (4 days); ICANS in 34.8%. Interventions included tocilizumab and corticosteroids. Grades were frequently undocumented (CRS 62.3%, ICANS 56.1%); documented cases were mainly CRS grade 1 (59.4%) and ICANS grade 2 (28.0%). Performance was high on CRS and ICANS grading (accuracy of 0.97 and 0.91, respectively). Performance was moderate for prespecified AE extraction (accuracies 0.62-0.76) and non-prespecified AEs (accuracies 0.76-0.84). Inter-rater reliability was strong to near-perfect for CRS/ICANS presence and grade (kappa 0.86-0.96), moderate for dates and interventions, and weaker for broader AE attributes. LLM-derived insights can augment AE monitoring and real-world evidence generation by unlocking unstructured clinical detail and characteristic timelines after CAR-T therapy. However, performance varied for broader AE attributes, warranting cautious use. Performance was highest for detecting the presence and grade of CRS and ICANS, with strong to near-perfect inter-rater reliability.
While cautious use of LLMs for broad AE extraction is warranted due to the variable performance observed in this study, these results support integrating high-performing CRS/ICANS extraction into EHR workflows. Author summary: Chimeric Antigen Receptor T-cell (CAR-T) therapy has transformed care for blood cancer but requires careful tracking of adverse events (AEs). We asked whether a large language model could read routine clinical notes and extract AEs after CAR T-cell therapy. We analyzed de-identified notes from the first month after infusion. The model identified when two key side effects occurred--cytokine release syndrome (a whole-body inflammatory reaction) and neurotoxicity (brain and nerve symptoms)--and how severe they were, with accuracy similar to human reviewers. It also captured when side effects started and what treatments were given, though performance was more variable for the wider range of side effects beyond these two. In our data, these reactions often arose within the first week; blood count problems and infections were also common. Because many notes did not state severity explicitly, the model sometimes could not assign a grade. Our findings suggest that language models can help unlock important details hidden in clinical notes and could be incorporated into electronic records to support faster, more reliable side-effect monitoring and research. We recommend careful, supervised use and continued validation, especially for broader side-effect categories.